Job Summary:
We are looking for an experienced Senior Big Data Engineer with at least 6 years of hands-on experience in Big Data technologies, specifically Apache Spark, CouchDB, and Google Cloud Platform (GCP). This role is ideal for a highly skilled individual who thrives in a dynamic environment and is passionate about developing scalable, high-performance data solutions. The candidate should have extensive expertise in designing, implementing, and managing Big Data systems in the cloud, particularly on GCP.
Key Responsibilities:
- Data Pipeline Development: Design and implement robust and scalable data pipelines using Apache Spark to process large volumes of structured and unstructured data efficiently in both batch and real-time processing modes.
- Spark Optimization: Optimize Spark jobs for performance, including tuning configurations, partitioning strategies, and memory management to handle complex computations over large datasets.
- CouchDB Integration: Architect, manage, and optimize CouchDB instances for efficient storage and retrieval of NoSQL data, ensuring high availability, fault tolerance, and scalability in a distributed environment.
- Cloud Infrastructure: Leverage Google Cloud Platform (GCP) services such as BigQuery, Cloud Dataproc, Cloud Functions, Dataflow, and Google Cloud Storage to build cloud-native solutions for data storage, processing, and analysis.
- Data Modeling and Management: Design data models that efficiently integrate with CouchDB and other databases. Ensure optimal schema design, indexing, and query performance to support high-volume, low-latency operations.
- Real-Time Data Processing: Use Apache Kafka, Spark Streaming, and Google Cloud Pub/Sub to build systems that support real-time data ingestion and processing, enabling near-instantaneous insights and analytics.
- Cloud-Based Big Data Solutions: Implement and maintain fully automated, scalable data pipelines in the cloud using Google Kubernetes Engine (GKE), Docker, and Terraform.
- Collaboration with Teams: Work closely with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver solutions that meet business needs.
- Monitoring and Maintenance: Continuously monitor the performance and health of data pipelines and cloud services using tools like Google Cloud Monitoring (formerly Stackdriver) to ensure uptime, reliability, and prompt resolution of issues.
- Documentation & Best Practices: Maintain detailed documentation of data pipelines, infrastructure, and processes. Promote best practices for Big Data engineering, data security, and cloud cost optimization.